Statistics of the Kolkata Paise Restaurant Problem 



Asim Ghosh 1 , Arnab Chatterjee 2 , Manipushpak Mitra 3 and 
Bikas K Chakrabarti 1,3 

1 Theoretical Condensed Matter Physics Division and Centre for Applied 
Mathematics & Computational Science, Saha Institute of Nuclear Physics, 1/AF 
Bidhannagar, Kolkata 700 064, India 

2 Condensed Matter and Statistical Physics Section, The Abdus Salam International 
Centre for Theoretical Physics, Strada Costiera 11, Trieste I-34014, Italy 

3 Economic Research Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, 
Kolkata 700108, India 

E-mail: asim.ghosh@saha.ac.in, achatter@ictp.it, mmitra@isical.ac.in & 
bikask.chakrabarti@saha.ac.in 

Abstract. We study the dynamics of a few stochastic learning strategies for the 
"Kolkata Paise Restaurant" problem, where $N$ agents choose among $N$ equally priced 
but differently ranked restaurants every evening such that each agent tries to get dinner 
in the best restaurant (each serving only one customer, with the rest arriving there going 
without dinner that evening). We consider the learning strategies to be similar for all 
the agents and assume that each follows the same probabilistic or stochastic strategy 
dependent on the information of the past successes in the game. We show that some 
"naive" strategies lead to much better utilization of the services than some relatively 
"smarter" strategies. We also show that a service utilization fraction as high as 
0.80 can result for a stochastic strategy, where each agent sticks to his past choice 
(independent of success achieved or not; with probability decreasing inversely in the 
past crowd size). The numerical results for utilization fraction of the services in some 
limiting cases are analytically examined. 

1. Introduction 

The Kolkata Paise Restaurant (KPR) problem [1, 2, 3] is a repeated game, played 
between a large number $N$ of agents having no interaction amongst themselves. In the 
KPR problem, prospective customers (agents) choose from $N$ restaurants each evening 
simultaneously (in parallel decision mode); $N$ is fixed. Each restaurant has the same 
price for a meal but a different rank (agreed upon by all customers) and can serve only 
one customer on any evening. Information regarding the customer distributions for earlier 
evenings is available to everyone. Each customer's objective is to go to the restaurant 
with the highest possible rank while avoiding the crowd so as to be able to get dinner 
there. If more than one customer arrives at any restaurant on any evening, one of them 
is randomly chosen (each of them is treated anonymously) and is served. The rest do 
not get dinner that evening. 

In Kolkata, there were very cheap and fixed rate "Paise Restaurants" that were 
popular among the daily laborers in the city. During lunch hours, the laborers used to 
walk (to save the transport costs) to one of these restaurants and would miss lunch if 
they got to a restaurant where there were too many customers. Walking down to the 
next restaurant would mean failing to report back to work on time! Paise is the smallest 
Indian coin and there were indeed some well-known rankings of these restaurants, as 
some of them would offer tastier items compared to the others. A more general example 
of such a problem would be when the society provides hospitals (and beds) in every 
locality but the local patients go to hospitals of better rank (commonly perceived) 
elsewhere, thereby competing with the local patients of those hospitals. Unavailability 
of treatment in time may be considered as lack of the service for those people and 
consequently as (social) wastage of service by those unattended hospitals. 

A social planner's (or dictator's) solution to the KPR problem is the following: the 
planner (or dictator) asks everyone to form a queue and then assigns each one a restaurant 
with rank matching the position of the person in the queue on the first evening. Then 
each person is told to go to the next ranked restaurant on the following evening (for the 
person in the last ranked restaurant this means going to the first ranked restaurant). 
This shift process then continues for successive evenings. Call this solution the fair 
social norm. This is clearly one of the most efficient solutions (with utilization fraction 
$f$ of the services by the restaurants equal to unity) and the system arrives at this 
solution immediately (from the first evening itself). However, in reality this cannot be 
the true solution of the KPR problem, where each agent decides on his own (in parallel 
or democratically) every evening, based on complete information about past events. In 
this game, the customers try to evolve a learning strategy to eventually get dinners at 
the best possible ranked restaurant, avoiding the crowd. It is seen that the evolution of 
these strategies takes considerable time to converge and even then the eventual utilization 
fraction $f$ is far below unity. The KPR problem has some basic features similar to 
the minority game problem [4, 5] in that diversity is encouraged (compared to herding 
behavior) in both, while it differs from (two-choice) minority games in terms of the 
macroscopic size of the choices. 

As already shown in Ref. [1], a simple random-choice algorithm, if adopted by all the 
agents, can lead to a reasonable value of utilization fraction ($f \simeq 0.63$). Compared to 
this, several seemingly "more intelligent" stochastic algorithms lead to lower utilization 
of the services. Ref. [3] studied a few more such "smarter" algorithms, having 
several attractive features (including analytical estimate possibilities), but still failing 
to improve the overall utilization fraction beyond its random choice value. Here we 
develop a stochastic strategy, which maintains a naive tendency (probability decreasing 
with past crowd size) to stick to any agent's own past choice (successful or not), leading 
to the highest value of the utilization fraction found so far ($f \simeq 0.80$) in the KPR problem. 
We also estimate here analytically the $f$ values for several such strategies. 



2. Stochastic learning strategies 



Let the symmetric stochastic strategy chosen by each agent be such that at any time $t$, 
the probability $p_k(t)$ to arrive at the $k$-th ranked restaurant is given by 

$$ p_k(t) = \frac{1}{z}\left[k^{\alpha}\exp\left(-\frac{n_k(t-1)}{T}\right)\right], \qquad z = \sum_{k=1}^{N}\left[k^{\alpha}\exp\left(-\frac{n_k(t-1)}{T}\right)\right], \qquad (1) $$

where $n_k(t)$ denotes the number of agents arriving at the $k$-th ranked restaurant in 
period $t$, $T > 0$ is a scaling factor and $\alpha \ge 0$ is an exponent. Note that under (1) 
the probability of selecting a particular restaurant increases with its rank and decreases 
with its popularity in the immediate past (given by the number $n_k(t-1)$). Certain 
properties of the strategies given by (1) are the following: 

(i) For $\alpha = 0$ and $T \to \infty$, $p_k(t) = 1/N$ corresponds to the complete random choice case 
for which we know [1] that the utilization fraction is around 0.63, that is, on 
average there is 63% utilization of the restaurants (see appendix A). 

(ii) For $\alpha = 0$ and $T \to 0$, the agents avoid those restaurants visited last evening 
and choose again randomly from the remaining restaurants [3]. Numerical 
simulation shows that the distribution of the fraction $f$ of utilization of the 
restaurants is Gaussian around 0.46 (see subsection 2.2). 
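The two limits listed above can be checked numerically. The following is a minimal sketch, assuming the strategy has the form $p_k(t) \propto k^{\alpha}\exp(-n_k(t-1)/T)$; the function name `choice_probabilities` and the shift by the smallest occupancy (which cancels in the normalization and only avoids a zero denominator at small $T$) are illustrative choices, not part of the original study.

```python
import math


def choice_probabilities(prev_counts, alpha, T):
    """Return [p_1, ..., p_N] given last evening's occupancies n_k(t-1).

    Assumed form: p_k proportional to k**alpha * exp(-n_k(t-1) / T).
    prev_counts[k-1] holds n_k(t-1) for the restaurant of rank k.
    """
    c_min = min(prev_counts)
    # Subtracting c_min rescales every weight by the same factor, so the
    # normalized probabilities are unchanged, but z cannot underflow to 0.
    weights = [
        (k ** alpha) * math.exp(-(c - c_min) / T)
        for k, c in enumerate(prev_counts, start=1)
    ]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    counts = [2, 0, 1, 1]  # hypothetical occupancies for N = 4
    print(choice_probabilities(counts, alpha=0, T=math.inf))  # random choice
    print(choice_probabilities(counts, alpha=0, T=1e-6))      # strict avoidance
    print(choice_probabilities(counts, alpha=1, T=math.inf))  # rank dependent
```

With $T = \infty$ the exponential factor equals one for every restaurant, so only the rank weighting survives; with very small $T$ all weight falls on the least crowded restaurants of the previous evening.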






2.1. Rank dependent strategies 

For any natural number $\alpha$ and $T \to \infty$, an agent goes to the $k$-th ranked restaurant 
with probability $p_k(t) = k^{\alpha}/\sum_{k} k^{\alpha}$, which is the $T \to \infty$ limit of Eq. (1). 
Let us discuss the results for such a strategy here. 

If an agent selects any restaurant with equal probability $p$, then the probability that a 
single restaurant is chosen by $m$ agents is given by 

$$ \Delta(m) = \binom{N}{m} p^{m} (1 - p)^{N-m}. \qquad (2) $$

Therefore, the probability that a restaurant with rank $k$ is not chosen by any of the 
agents will be given by 

$$ \Delta_k(m = 0) = \binom{N}{0}(1 - p_k)^{N}, \quad p_k = \frac{k^{\alpha}}{\sum_{k} k^{\alpha}}; \qquad \Delta_k(m = 0) \simeq \exp\left(-\frac{k^{\alpha} N}{\tilde{N}}\right) \ \text{as } N \to \infty, \qquad (3) $$

where $\tilde{N} = \sum_{k=1}^{N} k^{\alpha} \simeq \int_{0}^{N} k^{\alpha}\,dk = N^{\alpha+1}/(\alpha+1)$. Hence 

$$ \Delta_k(m = 0) = \exp\left(-\frac{k^{\alpha}(\alpha+1)}{N^{\alpha}}\right). \qquad (4) $$

Therefore the average fraction of agents getting dinner in the $k$-th ranked restaurant is 
given by 

$$ f_k = 1 - \Delta_k(m = 0) \qquad (5) $$



o 

-I — • 

cd 

N 



o 

H — • 

O 
CO 

CD 
CD 
cd 

a> 
> 
cd 



0.8 



0.6 



0.4 



0.2 



a=0 
a=1 
a=2 
a=3 

1-e" 4x 



/ 0.1 
0.08 

^ 0.06 
0.04 
0.02 




h — ~ — i 1 r 




0.3 0.4 0.5 0.6 0.7 0.8 

f 



0.2 0.4 0.6 0.8 

rank of the restaurants (k) 



1 



Figure 1. The main figure shows average fraction of utilization (fk) versus rank of the 
restaurants (k) for different a values. The inset shows the distribution D(f = E fk/N) 
of the fraction / agent getting dinner any evening for different a values. 



and the numerical estimates of $f_k$ are shown in Fig. 1. Naturally, for $\alpha = 0$ the problem 
corresponds to random choice, $f_k = 1 - e^{-1}$, giving $f = \sum_k f_k/N \simeq 0.63$, and for $\alpha = 1$, 
$f_k = 1 - e^{-2k/N}$, giving $f = \sum_k f_k/N \simeq 0.58$, as already obtained analytically earlier (see 
appendix B). 
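The rank-dependent estimates can be evaluated directly. The sketch below assumes $p_k = k^{\alpha}/\sum_k k^{\alpha}$ and compares the finite-$N$ expression $f_k = 1 - (1 - p_k)^N$ with the large-$N$ form $f_k = 1 - \exp[-(\alpha+1)(k/N)^{\alpha}]$; the function names and the choice $N = 1000$ are illustrative. For $\alpha = 1$ the large-$N$ average is $1 - (1 - e^{-2})/2 \approx 0.57$, close to the value quoted above.

```python
import math


def exact_utilization(n, alpha):
    """Average over k of f_k = 1 - (1 - p_k)^N, with p_k = k^alpha / sum_k k^alpha."""
    weights = [k ** alpha for k in range(1, n + 1)]
    total = sum(weights)
    return sum(1.0 - (1.0 - w / total) ** n for w in weights) / n


def large_n_utilization(n, alpha):
    """Average over k of the large-N form f_k = 1 - exp(-(alpha + 1) (k/N)^alpha)."""
    return sum(
        1.0 - math.exp(-(alpha + 1) * (k / n) ** alpha) for k in range(1, n + 1)
    ) / n


if __name__ == "__main__":
    for alpha in (0, 1, 2, 3):
        print(alpha,
              round(exact_utilization(1000, alpha), 3),
              round(large_n_utilization(1000, alpha), 3))
```

The utilization decreases as $\alpha$ grows, because a stronger preference for particular ranks concentrates the agents on fewer restaurants.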

2.2. Strict crowd-avoiding case 

We consider here the case (see also [3]) where each agent chooses on any evening ($t$) 
randomly among the restaurants in which nobody had gone in the last evening ($t - 1$). 
This corresponds to the case where $\alpha = 0$ and $T \to 0$ in Eq. (1). Our numerical 
simulation results show that the distribution $D(f)$ of the fraction $f$ of utilized restaurants is 
again Gaussian with a most probable value at $f \simeq 0.46$. This can be explained in the 
following way: As the fraction $f$ of restaurants visited by the agents in the last evening 
is avoided by the agents this evening, the number of available restaurants is $N(1 - f)$ 
for this evening, and these are chosen randomly by all the $N$ agents. Hence, when fitted to Eq. 
(A.1) (in appendix A), $\lambda = 1/(1 - f)$. Therefore, following Eq. (A.1), we can write the 
equation for $f$ as 

$$ (1 - f)\left[1 - \exp\left(-\frac{1}{1 - f}\right)\right] = f. \qquad (6) $$

The solution of this equation gives $f \simeq 0.46$. This result agrees well with the numerical 
results for this limit ($\alpha = 0$, $T \to 0$). 
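The self-consistency argument above can be solved numerically. The sketch below assumes the condition takes the form $(1 - f)[1 - \exp(-1/(1 - f))] = f$, as follows from $N$ agents choosing at random among $N(1 - f)$ available restaurants, and finds the root by bisection; the function names and the bracketing interval are illustrative.

```python
import math


def crowd_avoiding_residual(f):
    """Left-hand side minus right-hand side of (1-f)[1 - exp(-1/(1-f))] = f."""
    return (1.0 - f) * (1.0 - math.exp(-1.0 / (1.0 - f))) - f


def solve_utilization(lo=0.0, hi=0.9, tol=1e-10):
    """Bisection on [lo, hi]; the residual is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if crowd_avoiding_residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(solve_utilization(), 3))
```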

2.3. Stochastic crowd-avoiding case 

In this section we start with the following stochastic strategy: if an agent goes to 
restaurant $k$ in period ($t - 1$) then the agent goes to the same restaurant in the 
next period with probability $p_k(t) = 1/n_k(t-1)$ and to any other restaurant $k' (\ne k)$ with 
probability $p_{k'}(t) = (1 - p_k(t))/(N - 1)$. In this process, the average utilization fraction is $f \simeq 0.8$ 
and the distribution $D(f)$ is a Gaussian around $f \simeq 0.8$ (see Fig. 2). 

An approximate estimate of $f$: Let $a_i$ denote the fraction of restaurants where 
exactly $i$ agents ($i = 0, \ldots, N$) appeared on any evening and assume that $a_i = 0$ for 
$i \ge 3$. Therefore, $a_0 + a_1 + a_2 = 1$, $a_1 + 2a_2 = 1$ and hence $a_0 = a_2$. Given the strategy, 
an $a_2$ fraction of agents will make attempts to leave their respective restaurants in the next 
evening ($t + 1$), while no intrinsic activity will occur in the restaurants where nobody 
came ($a_0$) or only one came ($a_1$) in the previous evening ($t$). This $a_2$ fraction of agents 
will now get equally divided (each among the remaining $N - 1$ restaurants). Of these $a_2$, the 
fraction going to the vacant restaurants ($a_0$ in the earlier evening) is $a_0 a_2$. Hence the 
new fraction of vacant restaurants is now $a_0 - a_0 a_2$. In restaurants having exactly two 
agents (a fraction $a_2$ in the last evening), some vacancy will be created due to this process 
(each of the two agents stays with probability 1/2, so both leave with probability 1/4), 
and this is equal to $\frac{a_2}{4} - a_2\frac{a_2}{4}$. Steady state implies that 
$a_0 - a_0 a_2 + \frac{a_2}{4} - a_2\frac{a_2}{4} = a_0$ 
and hence, using $a_0 = a_2$, we get $a_0 = a_2 = 0.2$, giving $a_1 = 0.6$ and $f = a_1 + a_2 = 0.8$. 
Of course, the above calculation is approximate, as none of the restaurants is assumed 
to get more than two customers on any evening ($a_i = 0$ for $i \ge 3$). The advantage in 
assuming only $a_0$, $a_1$ and $a_2$ to be non-vanishing on any evening is that the activity of 
redistribution on the next evening starts from this $a_2$ fraction of the restaurants. This 
of course affects $a_0$ and $a_1$ for the next evening, and for steady state these changes must 
balance. The computer simulation results also confirm that $a_i < 0.03$ for $i \ge 3$ and 
hence the above approximation does not lead to serious error. 
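The strategy of this subsection is simple to simulate. The following is a minimal sketch, assuming each agent returns to last evening's restaurant with probability $1/n_k(t-1)$ and otherwise picks one of the other $N - 1$ restaurants uniformly at random; the function name, the system size, the number of evenings and the burn-in length are illustrative choices rather than the settings used for the figures.

```python
import random
from collections import Counter


def simulate_crowd_avoiding(n=1000, evenings=300, burn_in=100, seed=0):
    """Return (mean utilization f, {i: fraction a_i of restaurants with i agents})."""
    rng = random.Random(seed)
    choice = [rng.randrange(n) for _ in range(n)]  # random first evening
    utilization_sum = 0.0
    occupancy_hist = Counter()
    measured = 0
    for t in range(evenings):
        counts = Counter(choice)  # n_k for this evening (occupied k only)
        if t >= burn_in:
            measured += 1
            utilization_sum += len(counts) / n  # one customer served per occupied k
            occupancy_hist.update(counts.values())
            occupancy_hist[0] += n - len(counts)
        next_choice = []
        for k in choice:
            if rng.random() < 1.0 / counts[k]:
                next_choice.append(k)  # stay with probability 1 / n_k
            else:
                other = rng.randrange(n - 1)  # uniform over the other N - 1
                next_choice.append(other + 1 if other >= k else other)
        choice = next_choice
    fractions = {i: c / (n * measured) for i, c in sorted(occupancy_hist.items())}
    return utilization_sum / measured, fractions


if __name__ == "__main__":
    f, a = simulate_crowd_avoiding()
    print(round(f, 3), {i: round(v, 3) for i, v in a.items()})
```

The returned fractions $a_0$, $a_1$, $a_2$, ... can be compared with the approximate estimate above, including the size of the neglected $a_i$ with $i \ge 3$.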

Figure 2. Numerical simulation results for a typical prospective customer distribution 
on any evening. 



3. Evolving Stochastic Strategy 

In this section we assume that agents have two possible exogenously given values of 
$\alpha$: $\alpha = 0$ or $\alpha = 1$. We start by taking some random allocation of $\alpha$ over the set of 
$N$ agents. The strategy followed by each agent thereafter is the following: if an agent 
starts with $\alpha = 0$ (or 1) and fails to get dinner for $r$ successive evenings then, on 
the next evening, the agent shifts to $\alpha = 1$ (or 0, respectively). The steady state distribution 
of the $\alpha$ values in the population of agents does not depend on the initial allocation of $\alpha$ values 
in the population (see Fig. 3). However, as is obvious, for large values of $r$ ($r > N$), the 
stability of the distribution disappears. 

4. Convergence to a fair social norm with deterministic strategies 

In the KPR problem, if the rational agents interact then a fair social norm that can 
evolve is a periodically organized state with periodicity $N$ where each agent in turn 
gets served in all the $N$ restaurants and all agents get served every evening. Can we 
find deterministic strategies (in the absence of a dictator) such that the society achieves 
this fair social norm? There is one variant of Pavlov's win-shift lose-stay strategy (see 
[6, 7, 8]) that can be adopted to achieve the fair social norm and another variant that 
can be adopted to achieve the fair social norm in an asymptotic sense. Of course, these 
strategies are deterministic in nature. 

Figure 3. Steady state distribution of successful agents having $\alpha = 0$. The same for 
$\alpha = 1$ will be given by just the complementary function. 

4.1. Fair strategy 

The fair strategy works as follows: 

(i) At time (evening) $t = 0$, agents can choose any restaurants either randomly or 
deterministically. 

(ii) If at time $t$ agent $i$ was in a restaurant ranked $k$ and was served then, at time $t + 1$, 
the agent moves to the restaurant ranked $k - 1$ if $k > 1$ and moves to the restaurant 
ranked $N$ if $k = 1$. 

(iii) If agent $i$ was in a restaurant ranked $k$ at time $t$ and was not served then, at time 
$t + 1$, the agent goes to the same restaurant. 

It is easy to verify that this strategy gives a convergence to the fair social norm in less 
than or equal to $N$ periods. Moreover, after convergence is achieved, the fair social norm 
is retained ever after. The difficulty with this strategy is that a myopic agent will find 
it hard to justify the action of going to the restaurant ranked last after getting served in 
the best ranked restaurant. However, if the agent is not that myopic, observes the 
past history of strategies played by all the agents and can figure out that this one-evening 
loss is a tacit commitment device for this kind of symmetric strategy to work, then this 
voluntary loss is not that implausible. Therefore one needs to run experiments before 
arguing for or against this kind of symmetric deterministic strategy. More importantly, 
the fair strategy can be modified to take care of this justification problem provided one 
wants to achieve the fair social norm in an asymptotic sense. 
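The convergence claim for the fair strategy can be checked with a short simulation. This is a sketch under the rules (i) to (iii) above, with the one served customer at a crowded restaurant drawn uniformly at random; the function name `evenings_until_fair_norm` and the optional `start` argument (a hypothetical list of initial ranks) are introduced here for illustration only.

```python
import random


def evenings_until_fair_norm(n, seed=0, start=None):
    """Return the first evening t on which all n agents are served, or None.

    Ranks run from 1 (best) to n. A served agent moves from rank k to k - 1
    (from rank 1 to rank n); an unserved agent returns to the same restaurant.
    """
    rng = random.Random(seed)
    if start is not None:
        position = list(start)
    else:
        position = [rng.randrange(1, n + 1) for _ in range(n)]
    for t in range(n + 1):
        by_restaurant = {}
        for agent, k in enumerate(position):
            by_restaurant.setdefault(k, []).append(agent)
        if len(by_restaurant) == n:  # every restaurant has exactly one agent
            return t
        for k, agents in by_restaurant.items():
            served = rng.choice(agents)  # one customer is served at random
            position[served] = k - 1 if k > 1 else n
    return None


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, evenings_until_fair_norm(n, seed=n))
    # Worst-looking start: everybody at the best-ranked restaurant.
    print(evenings_until_fair_norm(10, start=[1] * 10))
```

Once every restaurant holds exactly one agent, all agents are served and shift together, so the one-to-one assignment is preserved on all later evenings.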

4.2. Asymptotically fair strategy 

The asymptotically fair strategy works as follows: 

(i) At time (evening) $t = 0$, agents can choose any restaurants either randomly or 
deterministically. 

(ii) If at time $t$ agent $i$ was in a restaurant ranked $k$ and was served then, at time $t + 1$, 
the agent moves to the restaurant ranked $k - 1$ if $k > 1$ and goes to the same 
restaurant if $k = 1$. 

(iii) If agent $i$ was in a restaurant ranked $k$ at time $t$ and was not served then, at time 
$t + 1$, the agent goes to the restaurant ranked $N$. 

5. Summary and Discussion 

We consider the KPR problem where the decision made by each agent in each time period 
$t$ is independent and is based on the information about the rank $k$ of the restaurants 
and their occupancy given by the numbers $n_k(t-1), \ldots, n_k(0)$. We consider here in 
Sec. 2 several stochastic strategies where each agent chooses the $k$-th ranked restaurant 
with probability $p_k(t)$ given by Eq. (1). The utilization fraction $f_k$ of the $k$-th ranked 
restaurants on every evening is studied and their average (over $k$) distributions $D(f)$ 
are shown in Fig. 1 for some special cases. From numerical studies, we find their 
distributions to be Gaussian with the most probable utilization fraction $f \simeq 0.63$, 0.58 
and 0.46 for the cases with $\alpha = 0$, $T \to \infty$; $\alpha = 1$, $T \to \infty$; and $\alpha = 0$, $T \to 0$, 
respectively. For the stochastic crowd-avoiding strategy discussed in Sec. 2.3, we get 
the best utilization fraction $f \simeq 0.8$. The analytical estimates for $f$ in these limits are 
also given and they agree very well with the numerical observations. 

Finally, we suggest ways to achieve the fair social norm either exactly in the 
presence of an incentive problem or asymptotically in the absence of such an incentive problem. 
Implementing or achieving such a norm in a decentralized way is impossible in the 
$N \to \infty$ limit. The KPR problem has similarity with the Minority Game Problem 
[5] as in both the games, herding behavior is punished and diversity is encouraged. Also, 
both involve learning of the agents from the past successes etc. Of course, KPR has 
some simple exact solution limits, a few of which are discussed here. In none of the 
cases considered here are the learning strategies individualistic; rather all the agents choose 
following the probability given by Eq. (1). In a few different limits of such a learning 
strategy, the average utilization fraction $f$ and their distributions are obtained and 
compared with the analytic estimates, which are reasonably close. Needless to mention, 
the real challenge is to design algorithms of learning mixed strategies (e.g., from the 
pool discussed here) by the agents so that the fair social norm emerges eventually even 
when everyone decides on the basis of their own information independently. As we have 
seen, some naive strategies give better values of $f$ compared to most of the "smarter" 
strategies like strict crowd-avoiding strategies (Sec. 2.2) etc. This observation in fact 
compares well with earlier observation in minority games (see e.g., [9]). 

It may be noted that all the stochastic strategies, being parallel in computational 
mode, have the advantage that they converge to a solution in fewer time steps ($\sim \sqrt{N}$ 
or weakly dependent on $N$), while for deterministic strategies the convergence time is 
typically of order $N$, which renders such strategies useless in the truly macroscopic 
($N \to \infty$) limits. However, deterministic strategies are useful when $N$ is small and 
rational agents can design appropriate punishment schemes for the deviators (see [6]). 

In brief, the study of the KPR problem shows that a dictated solution leads 
to one of the best possible solutions to the problem, with each agent getting his dinner 
at the best ranked restaurant with a period of $N$ evenings, and with the best possible 
value of $f$ ($= 1$) starting from the first evening itself. The parallel decision strategies 
(employing evolving algorithms by the agents, and past information, e.g., of $n(t)$), 
which are necessarily parallel among the agents and stochastic (as in democracy), are 
less efficient ($f \ll 1$; the best one discussed here, in Sec. 2.3, giving $f \simeq 0.8$ only). We 
also note that most of the "smarter" strategies lead to much lower efficiency. 

Is there an upper bound for the value of the utilization fraction $f$ (less than unity, which is easily 
achieved in the dictated solution) for such stochastic strategies employed in parallel 
(democratically) by the agents in KPR? If so, what is this upper bound value? Also, 
what is the learning time required to arrive at such a solution (compared to zero waiting 
time to arrive at the most efficient dictated solution) in KPR? These questions 
are to be investigated in future. 

Appendix A. Random-choice case 

Suppose there are $\lambda N$ agents and $N$ restaurants. An agent can select any restaurant 
with equal probability. Therefore, the probability that a single restaurant is chosen by 
$m$ agents is given by a Poisson distribution in the limit $N \to \infty$: 

$$ \Delta(m) = \binom{\lambda N}{m} p^{m} (1 - p)^{\lambda N - m}, \quad p = \frac{1}{N}; \qquad \Delta(m) \to \frac{\lambda^{m}}{m!}\exp(-\lambda) \ \text{as } N \to \infty. \qquad (A.1) $$

Therefore the fraction of restaurants not chosen by any agent is given by $\Delta(m = 0) = 
\exp(-\lambda)$, which implies that the average fraction of restaurants occupied on any evening 
is given by [1] 

$$ f = 1 - \exp(-\lambda) \simeq 0.63 \ \text{for } \lambda = 1, \qquad (A.2) $$

in the KPR problem. 
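The random-choice result is easy to reproduce by direct sampling. The sketch below assumes $\lambda N$ agents each pick one of $N$ restaurants uniformly and independently every evening, and averages the fraction of restaurants visited at least once; the function name and default parameters are illustrative.

```python
import random


def random_choice_utilization(n_restaurants=1000, lam=1.0, evenings=200, seed=0):
    """Average fraction of restaurants visited when lam * N agents choose at random."""
    rng = random.Random(seed)
    n_agents = round(lam * n_restaurants)
    total = 0.0
    for _ in range(evenings):
        # A set keeps each visited restaurant once, however many agents arrive.
        visited = {rng.randrange(n_restaurants) for _ in range(n_agents)}
        total += len(visited) / n_restaurants
    return total / evenings


if __name__ == "__main__":
    print(round(random_choice_utilization(lam=1.0), 3))  # compare with 1 - exp(-1)
    print(round(random_choice_utilization(lam=2.0), 3))  # compare with 1 - exp(-2)
```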



Appendix B. Strict-rank-dependent choice 

In this case, an agent goes to the $k$-th ranked restaurant with probability $p_k(t) = k/\sum_k k$; 
that is, $p_k(t)$ given by (1) in the limit $\alpha = 1$, $T \to \infty$. Starting with $N$ restaurants 
and $N$ agents, we make $N/2$ pairs of restaurants, where each pair has restaurants ranked 
$k$ and $N + 1 - k$ with $1 \le k \le N/2$. Therefore, an agent chooses any pair of restaurants 
with uniform probability $p = 2/N$; that is, $N$ agents choose randomly from $N/2$ pairs of 
restaurants. Therefore the fraction of pairs selected by the agents (from Eq. (A.1)) is 

$$ f_0 = 1 - \exp(-\lambda) \simeq 0.86 \ \text{for } \lambda = 2. \qquad (B.1) $$

Also, the expected number of restaurants occupied in a pair of restaurants with rank $k$ 
and $N + 1 - k$ by a pair of agents is 

$$ 1 \times \frac{k^{2}}{(N+1)^{2}} + 1 \times \frac{(N+1-k)^{2}}{(N+1)^{2}} + 2 \times \frac{2k(N+1-k)}{(N+1)^{2}} = 1 + \frac{2k(N+1-k)}{(N+1)^{2}}. \qquad (B.2) $$

Therefore, the fraction of restaurants occupied by pairs of agents is 

$$ f_1 = \frac{1}{N} \sum_{k=1,\ldots,N/2} \left[1 + \frac{2k(N+1-k)}{(N+1)^{2}}\right] \simeq 0.67. \qquad (B.3) $$

Hence, the actual fraction of restaurants occupied by the agents is 

$$ f = f_0 \cdot f_1 \simeq 0.58. \qquad (B.4) $$
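The pairing estimate can be evaluated numerically. The sketch below assumes that, within a pair of ranks $k$ and $N + 1 - k$, an agent picks each restaurant with probability proportional to its rank, so that two agents occupy on average $1 + 2k(N+1-k)/(N+1)^2$ restaurants of the pair; the function name and the value $N = 1000$ are illustrative.

```python
import math


def pairing_estimate(n_restaurants=1000):
    """Return (f0, f1, f0 * f1) for the pairing argument with alpha = 1."""
    # Fraction of pairs chosen at least once: N agents over N/2 pairs, lambda = 2.
    f0 = 1.0 - math.exp(-2.0)
    m = n_restaurants + 1
    # Expected occupied restaurants per pair, summed over pairs, per restaurant.
    f1 = sum(
        1.0 + 2.0 * k * (m - k) / m ** 2
        for k in range(1, n_restaurants // 2 + 1)
    ) / n_restaurants
    return f0, f1, f0 * f1


if __name__ == "__main__":
    f0, f1, f = pairing_estimate()
    print(round(f0, 3), round(f1, 3), round(f, 3))
```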